fix: enable expandable segments for hopper+ #594
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
ashors1 left a comment
Thank you for making this change! I think we actually want to detect compute capability when using Megatron: https://github.com/NVIDIA-NeMo/RL/blob/main/nemo_rl/models/policy/megatron_policy_worker.py#L647-L651. @SahilJain314, correct me if I'm wrong, but my understanding is that we only hit issues with A100 + expandable segments when using Megatron.
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Made the change for both the dtensor and megatron workers in the latest commit.
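For readers following the thread, here is a rough sketch of the kind of compute-capability gate discussed above. It is not the code from this PR: `alloc_conf_for`, `apply_alloc_conf`, and `HOPPER_MAJOR` are hypothetical names, and the sketch assumes the caller already knows the device's major compute capability (for example from `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple). The only PyTorch-specific pieces are the `PYTORCH_CUDA_ALLOC_CONF` environment variable and its `expandable_segments:True` option.

```python
import os
from typing import MutableMapping

# Assumption for this sketch: Hopper GPUs (e.g. H100) report compute
# capability 9.x, while A100 reports 8.0.
HOPPER_MAJOR = 9


def alloc_conf_for(major: int, existing: str = "") -> str:
    """Return a PYTORCH_CUDA_ALLOC_CONF value for the given capability.

    Adds expandable_segments:True only on Hopper or newer, and leaves an
    explicit expandable_segments setting from the user untouched.
    """
    options = [opt for opt in existing.split(",") if opt.strip()]
    already_set = any(
        opt.strip().startswith("expandable_segments:") for opt in options
    )
    if major >= HOPPER_MAJOR and not already_set:
        options.append("expandable_segments:True")
    return ",".join(options)


def apply_alloc_conf(major: int, env: MutableMapping[str, str] = os.environ) -> None:
    """Write the computed value into env, skipping the write if it is empty."""
    conf = alloc_conf_for(major, env.get("PYTORCH_CUDA_ALLOC_CONF", ""))
    if conf:
        env["PYTORCH_CUDA_ALLOC_CONF"] = conf
```

Passing the capability in as an argument keeps the gate testable without a GPU. In a real worker the variable would presumably need to be in place before the process makes its first CUDA allocation.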
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: Jialei Chen <jialeic@google.com>
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
What does this PR do?
Add a one line overview of what this PR aims to accomplish.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information